Text Classification Using String Kernels (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

Authors

  • Huma Lodhi
  • John Shawe-Taylor
  • Nello Cristianini
  • Chris Watkins
Abstract

We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the kernel with a standard word feature space kernel [4] is made, showing encouraging results.
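The dynamic programming technique alluded to above can be sketched directly from the kernel's recursive definition. The Python function below is a minimal, unoptimised illustration (the naive O(k·|s|·|t|²) form rather than the faster variant); the function name ssk and the parameter names k and lam (the gap-decay factor) are illustrative choices, not identifiers taken from the paper.

```python
def ssk(s, t, k, lam):
    """Gap-weighted string subsequence kernel of order k (unnormalised).

    Naive O(k * |s| * |t|**2) dynamic programme; lam in (0, 1] is the
    decay factor that penalises gaps in the matched subsequences.
    """
    n, m = len(s), len(t)
    # kprime[i][p][q] is the auxiliary quantity K'_i over prefixes s[:p], t[:q]
    kprime = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    for p in range(n + 1):                 # base case: K'_0 = 1 everywhere
        for q in range(m + 1):
            kprime[0][p][q] = 1.0
    for i in range(1, k):
        for p in range(1, n + 1):
            x = s[p - 1]
            for q in range(1, m + 1):
                # K'_i(s x, t) = lam * K'_i(s, t)
                #   + sum over positions j where t_j = x of
                #     K'_{i-1}(s, t[:j-1]) * lam ** (|t| - j + 2)
                total = lam * kprime[i][p - 1][q]
                for j in range(1, q + 1):
                    if t[j - 1] == x:
                        total += kprime[i - 1][p - 1][j - 1] * lam ** (q - j + 2)
                kprime[i][p][q] = total
    value = 0.0                            # accumulate the order-k kernel itself
    for p in range(1, n + 1):
        for q in range(1, m + 1):
            if t[q - 1] == s[p - 1]:
                value += kprime[k - 1][p - 1][q - 1] * lam ** 2
    return value
```

In practice the score is normalised as K(s, t) / sqrt(K(s, s) * K(t, t)) so that long documents do not dominate purely because of their length.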


Similar articles

Latent Semantic Kernels for Feature Selection (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

Latent Semantic Indexing is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from document co-occurrences. The paper demonstrates how this method can be implemented implicitly in a kernel-defined feature space and hence adapted for application to any kernel-based learning algorithm and data. Experiments with...
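One common way to realise such subspace selection implicitly is to work directly with the documents' Gram (kernel) matrix and retain only its leading eigen-directions. The numpy sketch below illustrates that idea under those assumptions; the helper name latent_semantic_gram and the choice of k are hypothetical, and this is not claimed to be the paper's exact procedure.

```python
import numpy as np

def latent_semantic_gram(K, k):
    """Reduced-rank Gram matrix: keep only the k leading eigen-directions.

    K is a symmetric positive semi-definite kernel matrix over the
    training documents; the result is its rank-k approximation, an
    LSI-style subspace selection performed implicitly in feature space.
    """
    eigvals, eigvecs = np.linalg.eigh(K)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    V, L = eigvecs[:, top], eigvals[top]
    return (V * L) @ V.T                     # V diag(L) V^T
```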


New Support Vector Algorithms (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

We describe a new class of Support Vector algorithms for regression and classification. In these algorithms, a parameter lets one effectively control the number of Support Vectors. While this can be useful in its own right, the parametrization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, a...
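The formulation described here is commonly known as ν-SVM, and the same parametrisation is available off the shelf, for example as NuSVC/NuSVR in scikit-learn. The minimal usage sketch below runs on synthetic data; the data, the value nu=0.2 and the RBF kernel are illustrative assumptions, not settings from the paper.

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# nu upper-bounds the fraction of margin errors and lower-bounds the
# fraction of support vectors, taking over the role of C (and of the
# accuracy parameter epsilon in the regression variant NuSVR).
clf = NuSVC(nu=0.2, kernel="rbf").fit(X, y)
print(len(clf.support_), "support vectors out of", len(X))
```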


Dynamically Adapting Kernels in Support Vector Machines (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

The kernel parameter is one of the few tunable parameters in Support Vector machines, and it controls the complexity of the resulting hypothesis. The choice of its value amounts to model selection, and is usually performed by means of a validation set. We present an algorithm which can automatically perform model selection and learning with no additional computational cost and with no need of a va...


Discrete versus Analog Computation: Aspects of Studying the Same Problem in Different Computational Models (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

In this tutorial we want to outline some of the features that come up when analyzing the same computational problems in different complexity-theoretic frameworks. We will focus on two problems: the first related to mathematical optimization and the second dealing with the intrinsic structure of complexity classes. Both examples serve well for working out to what extent different approaches to the same pro...


Multiplicative Updatings for Support-Vector Learning (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)

Support Vector machines find maximal-margin hyperplanes in a high-dimensional feature space. Theoretical results exist which guarantee high generalization performance when the margin is large or when the number of support vectors is small. Multiplicative-Updating algorithms are a new tool for perceptron learning whose theoretical properties are well studied. In this work we present a Multiplica...
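For orientation, the classical example of a multiplicative update rule for perceptron-style learning is Winnow, sketched below. This is only a generic member of the family the abstract refers to, not the specific algorithm of that paper; all names and parameter values are illustrative.

```python
import numpy as np

def winnow(X, y, alpha=2.0, epochs=5):
    """Classical Winnow: multiplicative-update perceptron learning for
    binary {0,1} features and {0,1} labels (illustrative sketch only)."""
    n, d = X.shape
    w, theta = np.ones(d), float(d)        # multiplicative weights, threshold
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi >= theta else 0
            if pred != yi:
                # promote weights of active features on false negatives,
                # demote them on false positives
                w[xi == 1] *= alpha if yi == 1 else 1.0 / alpha
    return w
```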




Publication year: 2000